{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Loading Data Tutorial\n",
    "\n",
    "MLDB operates on data via [Datasets](../../../../doc/#builtin/datasets/Datasets.md.html), which can be created and populated in three different ways:\n",
    "\n",
    "1. You can create a mutable Dataset and insert data row by row via REST.\n",
    "1. You can create a Dataset from an existing file.\n",
    "1. You can create a Dataset by running a [Procedure](../../../../doc/#builtin/procedures/Procedure.md.html)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating a dataset via REST by POSTing rows\n",
    "\n",
    "Creating a Dataset is a simple REST call. The notebook cells below use `pymldb`'s `Connection` class to make [REST API](../../../../doc/#builtin/WorkingWithRest.md.html) calls. You can check out the [Using `pymldb` Tutorial](../../../../doc/nblink.html#_tutorials/Using pymldb Tutorial) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from pymldb import Connection\n",
    "mldb = Connection()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's create a dataset called `example`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<strong>PUT http://localhost/v1/datasets/example</strong><br /><strong style=\"color: green;\">201 Created</strong><br /> <div class=\"highlight\"><pre style=\"line-height: 125%\"><span></span>{\n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;status&quot;</span>: {\n",
       "    <span style=\"color: #333333; font-weight: bold\">&quot;columnCount&quot;</span>: <span style=\"color: #0000dd\">0</span>, \n",
       "    <span style=\"color: #333333; font-weight: bold\">&quot;rowCount&quot;</span>: <span style=\"color: #0000dd\">0</span>, \n",
       "    <span style=\"color: #333333; font-weight: bold\">&quot;valueCount&quot;</span>: <span style=\"color: #0000dd\">0</span>\n",
       "  }, \n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;config&quot;</span>: {\n",
       "    <span style=\"color: #333333; font-weight: bold\">&quot;type&quot;</span>: <span style=\"color: #0000dd\">&quot;sparse.mutable&quot;</span>, \n",
       "    <span style=\"color: #333333; font-weight: bold\">&quot;id&quot;</span>: <span style=\"color: #0000dd\">&quot;example&quot;</span>\n",
       "  }, \n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;state&quot;</span>: <span style=\"color: #0000dd\">&quot;ok&quot;</span>, \n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;type&quot;</span>: <span style=\"color: #0000dd\">&quot;sparse.mutable&quot;</span>, \n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;id&quot;</span>: <span style=\"color: #0000dd\">&quot;example&quot;</span>\n",
       "}\n",
       "</pre></div>\n"
      ],
      "text/plain": [
       "<Response [201]>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mldb.put('/v1/datasets/example', { \"type\":\"sparse.mutable\" })"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's all there is to it, and now we can add some rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<strong>POST http://localhost/v1/datasets/example/commit</strong><br /><strong style=\"color: green;\">200 OK</strong><br /> "
      ],
      "text/plain": [
       "<Response [200]>"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mldb.post('/v1/datasets/example/rows', {\n",
    "    \"rowName\": \"first row\",\n",
    "    \"columns\": [\n",
    "        [\"first column\", 1, 0],\n",
    "        [\"second column\", 2, 0]\n",
    "    ]\n",
    "})\n",
    "\n",
    "mldb.post('/v1/datasets/example/rows', {\n",
    "    \"rowName\": \"second row\",\n",
    "    \"columns\": [\n",
    "        [\"first column\", 3, 0],\n",
    "        [\"second column\", 4, 0]\n",
    "    ]\n",
    "})\n",
    "mldb.post(\"/v1/datasets/example/commit\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So now we have a little bit of data in our dataset. Let's check."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>first column</th>\n",
       "      <th>second column</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_rowName</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>first row</th>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>second row</th>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            first column  second column\n",
       "_rowName                               \n",
       "first row              1              2\n",
       "second row             3              4"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mldb.query(\"select * from example\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating a Dataset from a file\n",
    "\n",
    "Now we'll load a dataset from a CSV file. Let's start by creating a simple CSV file to import:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mkdir: cannot create directory ‘/mldb_data/datasets’: File exists\r\n"
     ]
    }
   ],
   "source": [
    "%mkdir /mldb_data/datasets"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Overwriting /mldb_data/datasets/sample.csv\n"
     ]
    }
   ],
   "source": [
    "%%writefile /mldb_data/datasets/sample.csv\n",
    "a column,another column\n",
    "a,b\n",
    "c,d"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can import this dataset with a simple procedure."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<strong>PUT http://localhost/v1/procedures/import_example2</strong><br /><strong style=\"color: green;\">201 Created</strong><br /> <div class=\"highlight\"><pre style=\"line-height: 125%\"><span></span>{\n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;status&quot;</span>: {\n",
       "    <span style=\"color: #333333; font-weight: bold\">&quot;firstRun&quot;</span>: {\n",
       "      <span style=\"color: #333333; font-weight: bold\">&quot;runStarted&quot;</span>: <span style=\"color: #0000dd\">&quot;2016-03-22T16:20:51.3476582Z&quot;</span>, \n",
       "      <span style=\"color: #333333; font-weight: bold\">&quot;status&quot;</span>: {\n",
       "        <span style=\"color: #333333; font-weight: bold\">&quot;numLineErrors&quot;</span>: <span style=\"color: #0000dd\">0</span>\n",
       "      }, \n",
       "      <span style=\"color: #333333; font-weight: bold\">&quot;runFinished&quot;</span>: <span style=\"color: #0000dd\">&quot;2016-03-22T16:20:51.3526092Z&quot;</span>, \n",
       "      <span style=\"color: #333333; font-weight: bold\">&quot;id&quot;</span>: <span style=\"color: #0000dd\">&quot;2016-03-22T16:20:51.347551Z-5bc7042b732cb41f&quot;</span>, \n",
       "      <span style=\"color: #333333; font-weight: bold\">&quot;state&quot;</span>: <span style=\"color: #0000dd\">&quot;finished&quot;</span>\n",
       "    }\n",
       "  }, \n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;config&quot;</span>: {\n",
       "    <span style=\"color: #333333; font-weight: bold\">&quot;params&quot;</span>: {\n",
       "      <span style=\"color: #333333; font-weight: bold\">&quot;outputDataset&quot;</span>: <span style=\"color: #0000dd\">&quot;example2&quot;</span>, \n",
       "      <span style=\"color: #333333; font-weight: bold\">&quot;runOnCreation&quot;</span>: <span style=\"color: #0000dd\">true</span>, \n",
       "      <span style=\"color: #333333; font-weight: bold\">&quot;dataFileUrl&quot;</span>: <span style=\"color: #0000dd\">&quot;file:///mldb_data/datasets/sample.csv&quot;</span>\n",
       "    }, \n",
       "    <span style=\"color: #333333; font-weight: bold\">&quot;type&quot;</span>: <span style=\"color: #0000dd\">&quot;import.text&quot;</span>, \n",
       "    <span style=\"color: #333333; font-weight: bold\">&quot;id&quot;</span>: <span style=\"color: #0000dd\">&quot;import_example2&quot;</span>\n",
       "  }, \n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;state&quot;</span>: <span style=\"color: #0000dd\">&quot;ok&quot;</span>, \n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;type&quot;</span>: <span style=\"color: #0000dd\">&quot;import.text&quot;</span>, \n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;id&quot;</span>: <span style=\"color: #0000dd\">&quot;import_example2&quot;</span>\n",
       "}\n",
       "</pre></div>\n"
      ],
      "text/plain": [
       "<Response [201]>"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mldb.put('/v1/procedures/import_example2', {\n",
    "    \"type\":\"import.text\",\n",
    "    \"params\": {\n",
    "        \"dataFileUrl\":\"file:///mldb_data/datasets/sample.csv\",\n",
    "        \"outputDataset\": \"example2\",\n",
    "        \"runOnCreation\": True\n",
    "    }\n",
    "})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "And a query to validate that things got loaded correctly!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>a column</th>\n",
       "      <th>another column</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_rowName</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>a</td>\n",
       "      <td>b</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>c</td>\n",
       "      <td>d</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         a column another column\n",
       "_rowName                        \n",
       "2               a              b\n",
       "3               c              d"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mldb.query(\"select * from example2\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating a Dataset by running a Procedure on another Dataset\n",
    "\n",
    "Procedures take Datasets as inputs and can create Datasets as outputs. This is how you can do data cleanup/transformation in MLDB. Here's a simple example with the [`transform` Procedure](../../../../doc/#builtin/procedures/TransformDataset.md.html):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<strong>POST http://localhost/v1/procedures/example/runs</strong><br /><strong style=\"color: green;\">201 Created</strong><br /> <div class=\"highlight\"><pre style=\"line-height: 125%\"><span></span>{\n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;status&quot;</span>: {\n",
       "    <span style=\"color: #333333; font-weight: bold\">&quot;columnCount&quot;</span>: <span style=\"color: #0000dd\">1</span>, \n",
       "    <span style=\"color: #333333; font-weight: bold\">&quot;rowCount&quot;</span>: <span style=\"color: #0000dd\">2</span>, \n",
       "    <span style=\"color: #333333; font-weight: bold\">&quot;valueCount&quot;</span>: <span style=\"color: #0000dd\">3</span>\n",
       "  }, \n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;runStarted&quot;</span>: <span style=\"color: #0000dd\">&quot;2016-03-22T16:20:51.3896525Z&quot;</span>, \n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;state&quot;</span>: <span style=\"color: #0000dd\">&quot;finished&quot;</span>, \n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;runFinished&quot;</span>: <span style=\"color: #0000dd\">&quot;2016-03-22T16:20:51.3908167Z&quot;</span>, \n",
       "  <span style=\"color: #333333; font-weight: bold\">&quot;id&quot;</span>: <span style=\"color: #0000dd\">&quot;2016-03-22T16:20:51.389521Z-5bc7042b732cb41f&quot;</span>\n",
       "}\n",
       "</pre></div>\n"
      ],
      "text/plain": [
       "<Response [201]>"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mldb.put('/v1/procedures/example', {\n",
    "    \"type\": \"transform\",\n",
    "    \"params\": {\n",
    "        \"inputData\": 'select \"first column\" + \"second column\" as \"transformed column\" from example',\n",
    "        \"outputDataset\": \"example3\"\n",
    "    }\n",
    "})\n",
    "mldb.post('/v1/procedures/example/runs')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>transformed column</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>_rowName</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>first row</th>\n",
       "      <td>3</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>second row</th>\n",
       "      <td>7</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "            transformed column\n",
       "_rowName                      \n",
       "first row                    3\n",
       "second row                   7"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mldb.query(\"select * from example3\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "## Where to next?\n",
    "\n",
    "Check out the other [Tutorials and Demos](../../../../doc/#builtin/Demos.md.html)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
